Define Character Sets
SmartZone OCR lets you define one or more character sets using the CharacterSet class. The Add and Remove methods add or remove one or more characters (passed as a string) or entire pre-defined character sets, so you can build subsets of your current character set collection as needed.
To see if a character is in the current Character Set, use the Contains method.
Language Support for Character Sets
If you specify a language, SmartZone OCR uses it to refine the contents of any character set containing alphabetic entries. For example, specifying Italian as the language and Alphanumeric as the character set limits the returned results to the letters of the Italian alphabet plus digits.
SmartZone OCR supports multiple languages, including the following:
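The refinement described above can be pictured as a filter over the character set. The following is a minimal Python sketch of that idea, not SmartZone OCR code: the function name `refine_by_language`, the variable names, and the sample "alphanumeric" universe are assumptions made for illustration, and the Italian letters are copied from the Italian row of the language table in this section.

```python
import string

# Extra letters for Italian, copied from the language table in this section
# (the guillemets are omitted because they are not letters).
ITALIAN_EXTRA_LETTERS = "ÀÈÉÌÒÙàèéìòù"

# Hypothetical stand-ins: plain Python sets, not SmartZone OCR objects.
italian_letters = set(string.ascii_letters) | set(ITALIAN_EXTRA_LETTERS)
digits = set(string.digits)

# An assumed broad "alphanumeric" universe that also holds letters
# belonging to other languages (ñ, ß, ø, ...).
alphanumeric = set(string.ascii_letters) | digits | set("ÑñÜüßØøÀÈÉÌÒÙàèéìòù")


def refine_by_language(charset, language_letters):
    """Keep non-letters as they are; keep a letter only if the language has it."""
    return {c for c in charset if not c.isalpha() or c in language_letters}


refined = refine_by_language(alphanumeric, italian_letters)
```

In this model, `refined` still holds the digits and the Italian accented letters, while letters such as ñ and ß drop out.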
Language | Description |
English | Is: a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 ! " # % & ' ( ) * , - . / : ; ? @ [ \ ] _ { | } $ ¢ £ ¥ € + < = > |
French | Is the English character set plus: « » À Â Ç È É Ê Ë Î Ï Ô Ù Û Ü à â ç è é ê ë î ï ô ù û ü |
Spanish | Is the English character set plus: « » ¡ ¿ Á É Í Ñ Ó Ú Ü á é í ñ ó ú ü |
Italian | Is the English character set plus: « » À È É Ì Ò Ù à è é ì ò ù |
German | Is the English character set plus: « » „ Ä Ö Ü ä ö ü ß |
Dutch | Is the English character set plus: À Á Â Ä Ç È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ö Ù Ú Û Ü à á â ä ç è é ê ë ì í î ï ñ ò ó ö ù ú ü |
Portuguese | Is the English character set plus: À Á Â Ç È É Ê Í Ò Ó Ô Õ Ú Ü à á â ç è é ê í ò ó ô õ ú ü |
Norwegian | Is the English character set plus: « » Å Æ Ø å æ ø |
Finnish | Is the English character set plus: « » Å Ä Ö å ä ö |
Danish | Is the English character set plus: « » „ Å Æ Ø å æ ø |
Swedish | Is the English character set plus: « » Å Ä Ö å ä ö |
Western European | Is all the supported characters in English, French, Spanish, Italian, German, Dutch, Portuguese, Norwegian, Finnish, Danish, and Swedish. |
For best recognition accuracy, start from the narrowest set that still includes every possible returned value, then limit the possible results further by applying the pre-defined character sets listed below. Character sets limit (reduce) the possible returned values once a universe of possible returned values is defined. For example, because é is not included in the English character set, accurately reading the word "Résumé" requires a language that includes é, such as French, which contains all English letters plus é. You could then improve recognition further by omitting any other characters you do not expect to encounter.
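The "Résumé" example can be checked with ordinary set operations. This is a hedged Python sketch, not SmartZone OCR code: the helper `missing_characters` and the variable names are hypothetical, and the character lists are copied from the English and French rows of the language table.

```python
import string

# Character lists copied from the language table; plain Python sets,
# not SmartZone OCR objects.
ENGLISH = set(string.ascii_letters + string.digits
              + "!\"#%&'()*,-./:;?@[\\]_{|}$¢£¥€+<=>")
FRENCH = ENGLISH | set("«»ÀÂÇÈÉÊËÎÏÔÙÛÜàâçèéêëîïôùûü")


def missing_characters(text, charset):
    """Return the characters of `text` that `charset` does not contain."""
    return {c for c in text if c not in charset}


word = "Résumé"
missing_in_english = missing_characters(word, ENGLISH)  # é is not covered
missing_in_french = missing_characters(word, FRENCH)    # nothing is missing

# Narrowing step: keep only the characters actually expected in the field.
expected_only = FRENCH & set(word)
```

Under these assumptions, English leaves é uncovered while French covers the whole word, and `expected_only` is the much smaller subset that the last sentence of the paragraph recommends.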
Predefined Character Sets
There are 12 additional pre-defined character sets available as properties:
# | Predefined Character Set | Description |
1 | AllAlphas | Includes all upper and lower case alpha characters. |
2 | AllCharacters | Includes all upper and lower case alpha, all digits, punctuation, currency and arithmetic characters. |
3 | AlphaNumeric | Includes all upper and lower case alpha and digit characters. |
4 | Arithmetic | Includes all digits, arithmetic and arithmetic punctuation characters 0123456789+<=>%-.*/ |
5 | ArithmeticSymbols | Includes all arithmetic characters +<=> |
6 | Currency | Includes all digits, currency and currency punctuation characters 0123456789$¢€£¥,.'-= |
7 | CurrencySymbols | Includes all currency characters $¢€£¥ |
8 | Digits | Includes all digits as characters 0123456789 |
9 | LowerCase | Includes only lower case alpha characters abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêë |
10 | PhoneNumber | Includes 0123456789-.+/EXText() |
11 | Punctuation | Includes only punctuation characters !"#%&'()*,-./:;?@[\]_{|}¡¿«»„ |
12 | UpperCase | Includes only upper case alpha characters A B C D E F G H I J K L M N O P Q R S T U V W X Y Z À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü |
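To illustrate how the narrower sets in the table nest inside the broader ones, and how to pick the narrowest set for a field, here is a small Python sketch. It works on plain strings copied from the table and does not call SmartZone OCR; the helper `narrowest_cover` and the sample field values are hypothetical.

```python
# Character lists copied from the table above; plain strings,
# not SmartZone OCR properties.
PREDEFINED = {
    "Digits": "0123456789",
    "ArithmeticSymbols": "+<=>",
    "CurrencySymbols": "$¢€£¥",
    "Arithmetic": "0123456789+<=>%-.*/",
    "Currency": "0123456789$¢€£¥,.'-=",
}

# The broader sets are the digits, the matching symbol set, and a few
# punctuation marks on top.
arithmetic_punctuation = (set(PREDEFINED["Arithmetic"])
                          - set(PREDEFINED["Digits"])
                          - set(PREDEFINED["ArithmeticSymbols"]))
currency_punctuation = (set(PREDEFINED["Currency"])
                        - set(PREDEFINED["Digits"])
                        - set(PREDEFINED["CurrencySymbols"]))


def narrowest_cover(sample, named_sets):
    """Name of the smallest set containing every character of `sample`, or None."""
    covering = [(len(set(chars)), name)
                for name, chars in named_sets.items()
                if set(sample) <= set(chars)]
    return min(covering)[1] if covering else None


print(narrowest_cover("2024", PREDEFINED))    # Digits
print(narrowest_cover("$19.99", PREDEFINED))  # Currency
print(narrowest_cover("19.99", PREDEFINED))   # Arithmetic
```

A value such as "19.99" fits both Arithmetic and Currency; the sketch prefers Arithmetic only because it has one fewer character, which mirrors the advice to use the narrowest set that still covers every expected value.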
Edit Character Sets
Recognition works best with a character set that includes all of the characters that may be encountered, and only those characters.
SmartZone OCR provides the following methods for trimming character sets into subsets, which increases accuracy, confidence, and speed:
- Add(String) - This method allows you to add any one or more characters to a current character set.
- Add(CharacterSet) - This method allows you to concatenate character sets as needed.
- Remove(String) - This method allows you to remove one or more characters from a character set.
- Remove(CharacterSet) - This method allows you to remove an entire character set from the current character set.
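The semantics of the four overloads above, together with the Contains check, can be modeled with a small wrapper around a set. This Python sketch is a conceptual model only: `CharacterSetModel` and its lowercase method names are hypothetical and are not the SmartZone OCR CharacterSet class, and the part-number field is an invented example.

```python
class CharacterSetModel:
    """Toy model of an editable character set (not the SmartZone OCR class)."""

    def __init__(self, chars=""):
        self._chars = set(chars)

    @staticmethod
    def _as_set(value):
        # Accept either a string of characters or another model instance.
        if isinstance(value, CharacterSetModel):
            return set(value._chars)
        return set(value)

    def add(self, value):
        """Models Add(String) and Add(CharacterSet)."""
        self._chars |= self._as_set(value)

    def remove(self, value):
        """Models Remove(String) and Remove(CharacterSet)."""
        self._chars -= self._as_set(value)

    def contains(self, char):
        """Models Contains for a single character."""
        return char in self._chars


digits = CharacterSetModel("0123456789")
upper = CharacterSetModel("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

# Build a set for a hypothetical part-number field such as "AB-1234".
part_number = CharacterSetModel()
part_number.add(upper)     # add a whole set
part_number.add(digits)    # add another whole set
part_number.add("-")       # add a single character from a string
part_number.remove("IO")   # drop letters easily confused with 1 and 0
```

Starting from broad sets and then removing characters that cannot occur keeps the final set as narrow as the field allows, which is the goal stated at the top of this section.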